计算机系统：程序员视角（全球版）：内存的演进：虚拟地址与物理地址

现代处理器使用一种称为 虚拟寻址。虚拟内存（VM）是对主存的一种抽象，为每个进程提供一个私有的、连续的 线性地址空间。

1. 寻址方式的演进

在 物理寻址 （图 9.1）中，CPU 直接向 DRAM 发送一个 物理地址（PA） 。在 虚拟寻址 （图 9.2）中，处理器生成一个 虚拟地址（VA），该地址由 内存管理单元（MMU） 转换为物理地址（PA），然后才访问内存。

2. 层次结构与缓存

DRAM 作为磁盘存储的 DRAM 缓存 。由于磁盘延迟较高，系统采用 写回策略。地址翻译通过 TLB ，利用 TLB 索引（TLBI） 并由如 PROT_WRITE等保护位来保障安全。大地址空间（$N = 2^n$）支持复杂的段以及 操作系统对输入/输出设备的服务。

TERMINALbash — 80x24

> Ready. Click "Run" to execute.

QUESTION 1

[Fill in the blanks] Complete the following table for a Virtual Address Space where $N = 2^n$ and the largest address is $N-1$. Use units: $K=2^{10}, M=2^{20}, G=2^{30}, T=2^{40}, P=2^{50}, E=2^{60}$.

n=16: N=64K, Max=0xFFFF; n=32: N=4G, Max=0xFFFFFFFF; n=48: N=256T, Max=0xFFFFFFFFFFFF; n=64: N=16E, Max=0xFFFFFFFFFFFFFFFF

n=16: N=16K; n=32: N=32M; n=48: N=48G; n=64: N=64T

QUESTION 2

Why does the DRAM cache use a Write-back policy instead of Write-through?

To maintain Stack Discipline.

Due to the massive latency penalty of accessing disk storage.

To prevent internal fragmentation.

Because the L3 unified cache is 16-way associative.

QUESTION 3

What happens if a process attempts to write to a page where the PTE has PROT_WRITE set to 0?

A General protection fault occurs.

The MMU performs Swapping automatically.

Deferred coalescing is triggered.

The Reference bit (A bit) is cleared.

QUESTION 4

Which component is responsible for translating the Virtual Address to a Physical Address?

The L3 unified cache (8 MB, 16-way).

The Memory Management Unit (MMU).

The Rio Package.

The Buddy System Allocator.

QUESTION 5

What is the physical address (PA) if the MMU finds a mapping for a VA where VPN=0x0d and VPO=0x14, given PPN=0x0d?

PA = 0x354: PPN=0x0d, PPO=0x14

$PA = 0x140d$

$PA = 0x0001$

PA = 0x354: VPN=0x0d, VPO=0x14

Case Study: Advanced Heap Management & VM Translation

Optimizing Allocators and Address Translation Efficiency

You are designing a high-performance allocator for an Intel Core i7 system with a k-level page table hierarchy. The allocator uses a single global variable (heap_listp) and must minimize external fragmentation using Buddy System Allocation. You also need to modify the block structure to improve Peak memory utilization.

1. Implement the 'find_fit' function for a simple implicit list allocator using first-fit logic. Your solution must account for Internal fragmentation and return NULL if no block is found.

Solution:
Implementation: static void *find_fit(size_t asize) { void *bp; /* Iterate through the heap starting from heap_listp */ for (bp = heap_listp; GET_SIZE(HDRP(bp)) > 0; bp = NEXT_BLKP(bp)) { /* If block is not allocated and size is sufficient */ if (!GET_ALLOC(HDRP(bp)) && (asize <= GET_SIZE(HDRP(bp)))) { return bp; } } return NULL; /* No fit found */ } This function searches the linear heap space, balancing search speed with memory utilization.

2. Explain how to modify an allocator to remove boundary tags (footers) from allocated blocks. How does this affect Peak memory utilization and the mm_free(void *bp) logic?

Solution:
To remove footers in allocated blocks, we store an 'extra' bit in the header of the *next* block to indicate whether the *current* block is allocated. This allows us to reduce the minimum block size by one word, directly increasing Peak memory utilization by reducing internal fragmentation in small objects. However, mm_free and coalesce(bp) must now inspect the header of the next block to determine if the previous block can be merged, maintaining constant-time coalescing for free blocks (which still require footers).

3. Describe the 'Buddy System Allocation' process for a block of size 32 bytes at address xxx...x10000. How does it handle recursive splitting?

Solution:
The Buddy System finds a block of size $2^j$ where $k \le j \le m$. To allocate 32 bytes ($2^5$), if only a $2^7$ block is available, it splits into two $2^6$ buddies. One $2^6$ block is then split again into two $2^5$ buddies. One buddy at address xxx...x10000 is allocated, while its buddy (address xxx...x11000) remains free. This minimizes external fragmentation and allows for fast buddy-merging using bitwise address operations.